Photon Technical Deep Dive: How to Think Vectorized


This talk is a technical deep dive into the Photon project at Databricks: what vectorization is and how it works, a few benchmarks, and how you would build a new engine from its basic building blocks. A little about what motivated it: hardware trends have improved in two dimensions, storage and networking, but CPUs not so much, so the CPU tends to be the bottleneck on many workloads. That raises a natural question: where does the next level of performance come from, now that the layers around the CPU are faster and they're loading data more quickly?
There is also the kind of data to consider. Your data modeling may be pristine and clean, with well-chosen columns and data types, but very often it is not, and that's part of the normal process. Data has a full lifecycle: it arrives raw, then gets converted into silver and gold tables, moving through this lifecycle and finally producing datasets of varying degrees of quality. Demanding perfect schemas upfront is not realistic, so there's actually a case to be made for an engine that is good at both ends: pristine schemas and messy data alike. Those are the trends that motivate Photon. So where does Photon fit in? Databricks has this Delta Engine stack, and Photon is the execution layer at the bottom; on top sit the APIs you already use, such as SQL, DataFrames, or Koalas. What I want to show you here is that Photon is focused on that execution part of the stack.
So what is vectorization for the workloads we talked about? It's a specific technique for processing structured and semi-structured data: you take a query and decompose it into operators and expressions that process vectors of data, because modern CPUs are very good at doing this kind of work. It plays to the pipelining of CPUs. Columnar in-memory formats fit naturally, since it's kind of a batch-oriented style of execution. There's also some runtime adaptivity, which I'll talk about later on, and the idea of compute kernels: interpretation overhead is what tends to be slow, so you specialize code for only that specific part. These ideas all sound great, but it sounds a little nebulous, so let's build a mental model of how you would build a vectorized engine.
Given all of this, you might ask: why build completely from scratch? We experimented on a variety of expressions, numeric expressions, string expressions, and found there is still a lot to be gained: the speedup that you can achieve is in the range of 50 to 60x on some of these microbenchmarks. The same goes for hash tables for aggregations. Let me use a simple query that can help illustrate how the vectorized engine works: operators pass batches of columnar data between each other, and we'll walk through portions of that query in this talk.
Let's start with expression evaluation, focusing on the predicate that c2 is smaller than 10. The expression is parsed into an expression tree, and each node takes input vectors and produces another output vector. This is where the idea of compute kernels comes in. Let me show you what compute kernels look like: each one is implemented by code like the code you're seeing here. A kernel takes input vectors and produces a vector of output items; it loops over all the items, applies the operation, and writes them to the output. Let's look at what that means for us.
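A minimal sketch of such a compute kernel, in Python for readability (Photon itself is not written this way; the function and variable names here are illustrative, not Photon's actual API): a tight loop that reads one input vector, applies the comparison per item, and writes an output vector.

```python
def less_than_kernel(input_vec, constant, out):
    """Evaluate `input_vec[i] < constant` for every item in the batch."""
    for i in range(len(input_vec)):
        out[i] = input_vec[i] < constant
    return out

# Evaluate the predicate c2 < 10 over one batch of the c2 column.
c2 = [3, 12, 7, 25, 9]
result = less_than_kernel(c2, 10, [False] * len(c2))
# result → [True, False, True, False, True]
```

The point is the shape of the code: no per-row interpretation, just one branch-predictable loop over the whole batch.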
Now consider NULLs. The data input and output vectors that we've added in now carry an indicator of whether there's a NULL, and the operation must skip a row if any of the inputs is NULL. Adding in the NULLs makes the kernel slower. Real data may sometimes have NULLs, but I wished there was a way for me to process my mostly non-NULL data fast. The trick is that each batch knows whether it has NULLs or not, and we can use this property to pick a specialized kernel: one kernel that can deal with NULLs, and one that assumes no NULLs on both sides.
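A sketch of this batch-level specialization, assuming each batch tracks a `has_nulls` flag and a parallel validity vector (these names are my own, chosen for illustration): the dispatcher picks a fast NULL-free kernel when it can.

```python
def lt_kernel_no_nulls(values, constant, out_vals):
    # Fast path: no NULL checks inside the hot loop.
    for i in range(len(values)):
        out_vals[i] = values[i] < constant
    return out_vals, [True] * len(values)

def lt_kernel_with_nulls(values, valid, constant, out_vals):
    # Slow path: skip invalid (NULL) rows and propagate NULLs to the output.
    out_valid = [False] * len(values)
    for i in range(len(values)):
        if valid[i]:
            out_vals[i] = values[i] < constant
            out_valid[i] = True
    return out_vals, out_valid

def evaluate_lt(batch_values, batch_valid, has_nulls, constant):
    # Batch-level dispatch: specialize on whether this batch contains NULLs.
    out = [False] * len(batch_values)
    if has_nulls:
        return lt_kernel_with_nulls(batch_values, batch_valid, constant, out)
    return lt_kernel_no_nulls(batch_values, constant, out)
```

The decision is made once per batch, so the common NULL-free case pays no per-row cost.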
You can also apply this asymmetrically, for example specializing this to only the left input. There are other ideas in the same spirit for expressions, such as string encodings or min/max information about the data, but the main idea is that this allows you to adapt to what you're seeing at runtime, based on information that you have about each batch rather than inside of the schema. That matters because users don't need to declare constraints upfront; we still get the benefits because we do it batch by batch. Back to the inequality, there is another optimization we can do: since one side is the constant 10, we can kind of bake it into the kernel, as opposed to passing a vector. This optimization allows us to get another speedup.
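A sketch of baking a constant operand into the kernel (in a C++ engine this would typically be a separate template instantiation; here a closure stands in for that, and all names are illustrative): the generic kernel would need the constant materialized as a full vector, while the specialized one carries it directly.

```python
def lt_vector_vector(left, right, out):
    # Generic kernel: both operands are vectors, so a constant operand
    # would have to be materialized as a whole vector first.
    for i in range(len(left)):
        out[i] = left[i] < right[i]
    return out

def make_lt_constant_kernel(constant):
    # Specialized kernel with the constant "baked in": no second vector
    # to allocate, fill, or read in the hot loop.
    def kernel(left, out):
        for i in range(len(left)):
            out[i] = left[i] < constant
        return out
    return kernel

lt_10 = make_lt_constant_kernel(10)
```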
So we've made our way through the filter and evaluated it, but how do we represent its result for the batch that we were processing? The filter result is evaluated as a new kind of vector: an active rows vector. It's a lazy representation, just a vector indicating which rows are still active. Alongside the original columns c1, c2, c3, g1 and g2, we just tack on this active rows vector, including the original data unchanged, and pass it to the next operator. The point is that there are several benefits. One of them is that you don't have to copy and compact the data, which would be kind of expensive, and it's also helpful in more complicated cases. Why a lazy representation of filters? Active rows become a concept in the engine: they are used throughout the engine and are a handy way of expressing more complex operations. As for what this means for the kernels: if we take a look, the loop now simply iterates over only the rows that are active. With that, let's move on to aggregation.
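A sketch of the active-rows idea under the assumptions above (names are illustrative): the filter emits a list of surviving row indices instead of compacting the batch, and downstream kernels loop only over those indices while the column data stays untouched.

```python
def filter_lt(values, constant, active_rows):
    # Produce a new active-rows vector; the data itself is not copied.
    return [i for i in active_rows if values[i] < constant]

def add_kernel(a, b, out, active_rows):
    # A downstream kernel only touches the rows that are still active.
    for i in active_rows:
        out[i] = a[i] + b[i]
    return out

c2 = [3, 12, 7, 25, 9]
c3 = [1, 1, 1, 1, 1]
active = filter_lt(c2, 10, range(len(c2)))   # rows 0, 2, 4 survive
out = add_kernel(c2, c3, [0] * len(c2), active)
```

Note that `out` still has slots for the filtered-out rows; they are simply never read downstream, which is exactly the copy-avoidance benefit described above.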
The example query is grouping by g1 and g2, so we build a hash table that has these groups. You can think of g1 and g2 as the keys: for each row we hash the keys, and once we've found the right bucket we evaluate the aggregation function and update the aggregation buffers. That's a pretty simple algorithm to implement in a row-wise system. The challenge is to think in a columnar way: the engine is column-oriented and batch-oriented, so we basically go through all of the steps and kernelize all the things. Let me show you what that means exactly, because we'll see that there are big speedups to be achieved with this kind of approach, even for a very simple query.
These are nice speedups, so pay attention. We start with a batch; only g1, g2 and c3 are shown, and I've omitted the filter stuff because it just distracts from the aggregation. Step one: take g1 and g2 and compute the hashes, producing a vector of hashes that index entries into the hash table. Step two: using the hashes, we probe the hash table and get a vector that contains pointers to the buckets that we think might be a match. Step three: having identified candidate buckets, we compare the keys against those buckets to identify collisions, and we do it in a column-by-column way against the corresponding buckets. Some rows are pointing to buckets that don't actually match; for these collision cases, the non-matches are handled with a linear probing strategy: we advance to the next bucket and repeat the step, identifying the non-matches again until everything is resolved. Finally, once all of the bucket pointers are resolved, we take the c3 input vector and update the corresponding aggregation buffers.
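The steps above can be sketched end-to-end with a toy open-addressing table (this is my own simplification for illustration, not Photon's implementation; in particular the collision check here is per row rather than truly column-by-column, and all names and the SUM aggregate are assumptions):

```python
NUM_BUCKETS = 16

def vectorized_hash_agg(g1, g2, c3):
    # Each bucket is None or [key1, key2, sum_buffer].
    table = [None] * NUM_BUCKETS
    n = len(g1)

    # Step 1: compute a vector of hashes from the key columns.
    hashes = [hash((g1[i], g2[i])) % NUM_BUCKETS for i in range(n)]

    # Step 2: initial candidate bucket pointer for every row.
    buckets = hashes[:]

    # Steps 3-4: check candidates, resolve collisions by linear probing.
    unresolved = list(range(n))
    while unresolved:
        still_unresolved = []
        for i in unresolved:
            b = buckets[i]
            if table[b] is None:
                # Empty bucket: insert the keys and a fresh buffer.
                table[b] = [g1[i], g2[i], 0]
            elif table[b][0] != g1[i] or table[b][1] != g2[i]:
                # Collision: advance to the next bucket and retry.
                buckets[i] = (b + 1) % NUM_BUCKETS
                still_unresolved.append(i)
        unresolved = still_unresolved

    # Step 5: all bucket pointers resolved; update aggregation buffers
    # from the c3 input vector through the bucket pointers.
    for i in range(n):
        table[buckets[i]][2] += c3[i]

    return {(e[0], e[1]): e[2] for e in table if e is not None}
```

Each step is a pass over a whole vector, which is what "kernelize all the things" means in practice.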
Let me show you what such kernels might look like with a code snippet for us to walk through. The engine works in columnar form, but when dealing with hash tables it is convenient to represent the entries as rows, so this is a mixed column/row kernel: it reads a column of input values and writes into a representation whose values are sprayed across the buckets. It's pretty straightforward apart from the level of the indirection: for each row we read the aggregation function input and follow the pointer to the bucket. One thing to keep in mind when dealing with code like this is that the hash table might become large, at which point performance is dominated by cache miss latencies. Now that we've seen some of these techniques, let's talk about end-to-end performance.
The techniques we've just discussed give an improvement in throughput in microbenchmarks, but we also want to benchmark end-to-end; everything I've shown you so far was at the micro level, and we chose to focus on read queries. This chart is showing you an aggregation over a single integer column from the TPC-DS store_sales table, and the speedup we achieve on it. For customers who were interested and have suitable workloads, we see these kinds of speedups end-to-end for suitable queries.
We hope that we'll be able to bring this to more workloads over time. To recap, I talked a bit about the basic building blocks of a vectorized engine: compute kernels, because CPUs are very good at tight loops over vectors; batch-level adaptivity to the data you're seeing at runtime; the lazy filter representation with active rows; and vectorized operations for hash tables.